On the Problem of 'aboutness' in Document Analysis
Abstract
One of the most crucial problem areas of information science concerns the identification of what documents are 'about'. This paper seeks to define the notion of 'aboutness' within the context of recent work in text linguistics. It describes, first, the essential communicational structures of sentences, paragraphs and texts in terms of theme, rheme and thematic progression, connectors of clauses and sentences, and semantic progression. It then identifies the basic features of the global structures of narrative and expository texts, describes the interaction of macro- and microstructure in the interpretation of texts and the role of presupposed 'states of knowledge' in both text production and text comprehension. Finally, it is argued that for the purposes of information systems the 'aboutness' of documents is to be found among the presuppositions of authors concerning the knowledge of their potential readers.
Similar resources
An Implementation of Symbolic Aboutness Theory
Today information can be shared globally via the Internet and accessed from anywhere in the world. The increasing complexity and size of the WWW create a need for more effective information processing techniques such as information retrieval and filtering, information summarization, topic segmentation, data mining and information discovery. All of them can be fundamentall...
How nonmonotonic is Aboutness?
The notion of aboutness is fundamental to information retrieval. Assume there is a document d which is about query q. Now, if information is added to d yielding ~d, the question arises whether document ~d is about q. In other words, is aboutness monotonic with respect to information composition? This article shows that aboutness does have a nonmonotonic character with respect to composition.
Salience-Based Content Characterisation Of Text Documents
Summarisation is poised to become a generally accepted solution to the larger problem of content analysis. We offer an alternative perspective on this problem, by tackling the complementary task of content characterisation; our motivation for doing so is to avoid some of the fundamental shortcomings of summarisation technologies today. Traditionally, the document summarisation task has been tac...
Deciding Term Aboutness Probabilistically
Information retrieval is the quest to find those information objects relevant to a given information need. Relevance is a difficult notion to define operationally. As a consequence information retrieval mechanisms are typically driven by the decision of when one information carrier (e.g. a document) is about another (e.g. a query). As documents and queries are typically complex representations built...